Improved Constituent Context Model with Features

Authors

  • Yun Huang
  • Min Zhang
  • Chew Lim Tan
Abstract

The Constituent-Context Model (CCM) achieves promising results for unsupervised grammar induction, but its performance drops on longer sentences. In this paper, we describe a general feature-based model for CCM in which linguistic knowledge can be easily integrated as features. Features take a log-linear form with local normalization, so the Expectation-Maximization (EM) algorithm can still be used to estimate the model parameters. The l1-norm is used to control model complexity, leading to a sparse and compact grammar. We also propose using a separate development set for model selection and an additional test set for evaluation. Under this framework, suitable model parameters can be chosen automatically rather than set empirically. Experiments on the English treebank demonstrate that the feature-based model achieves comparable performance on short sentences and a significant improvement on longer sentences.
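The abstract names two ingredients, a locally normalized log-linear parameterization and an l1-norm penalty, without giving formulas. The following is a minimal generic sketch of those two ideas, not the authors' implementation: the feature templates (`span=`, `len=`, `first=`), the toy weights, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def local_log_linear(weights, features_of, outcomes):
    """Distribution over `outcomes` with P(o) proportional to exp(w . f(o)).

    Normalization is local: the sum runs only over the given outcomes,
    so the result is an ordinary multinomial that EM can re-estimate.
    """
    scores = {o: sum(weights.get(f, 0.0) for f in features_of(o)) for o in outcomes}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {o: math.exp(s - top) for o, s in scores.items()}
    z = sum(exp_scores.values())
    return {o: e / z for o, e in exp_scores.items()}

def l1_penalty(weights, strength):
    """l1-norm of the weight vector, scaled by a regularization strength."""
    return strength * sum(abs(w) for w in weights.values())

def span_features(span):
    """Hypothetical binary features for a span of part-of-speech tags."""
    tags = span.split()
    return ["span=" + span, "len=%d" % len(tags), "first=" + tags[0]]

# Toy weights (assumed values); features absent from the dict have weight 0.
weights = {"span=DT NN": 1.0, "len=2": 0.5, "first=VBD": -0.5}
candidate_spans = ["DT NN", "VBD DT", "NN VBD DT"]

probs = local_log_linear(weights, span_features, candidate_spans)
penalty = l1_penalty(weights, strength=0.1)
```

Because an l1 penalty tends to drive many weights to exactly zero during optimization, only a small set of features ends up with nonzero weight, which is one common reading of the "sparse and compact grammar" mentioned in the abstract.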

Similar resources

Unsupervised Grammar Induction Using a Parent Based Constituent Context Model

Grammar induction is one of the attractive research areas of natural language processing. Since both supervised and, to some extent, semi-supervised grammar induction methods require large treebanks, and such treebanks do not currently exist for many languages, we focused our attention on unsupervised approaches. Constituent Context Model (CCM) seems to be the state of the art in unsupervised gramma...

Transition-based Neural Constituent Parsing

Constituent parsing is typically modeled by a chart-based algorithm under probabilistic context-free grammars or by a transition-based algorithm with rich features. Previous models rely heavily on richer syntactic information through lexicalizing rules, splitting categories, or memorizing long histories. However, enriched models incur numerous parameters and sparsity issues, and are insufficient...

Testing the Weighted Salience Model of Conceptual Combination

Testing the Weighted Salience Model of Conceptual Combination. (December 2003) Merryl Joy Patterson, B.S., Indiana University; M.S., Texas A&M University. Chair of Advisory Committee: Dr. Steven M. Smith. In two experiments the Weighted Salience Model (WSM) of conceptual combination was examined. Several of the hypotheses set forth in the WSM were evaluated, including the importance of salience o...

Adaptation of physical and non-physical dimensions with the components of Wall Painting in the urban space of Isfahan

Physical texture in urban space has features such as space decoration, harmony, unity, continuity and spatial consistency in its constituent elements, which are among the most important factors in forming a favorable urban landscape. Wall painting, as one of the urban elements, has a direct relationship with the physical and non-physical dimensions and follows certain requirements in the de...

A Unified and Discriminative Soft Syntactic Constraint Model for Hierarchical Phrase-based Translation

In the last decade, there has been a great deal of research on soft syntactic features, much of which has led to improved performance for Hiero. However, it seems that the syntactic constituent features cannot all work together efficiently in Hiero optimized by MERT. In this paper, we propose a more general soft syntactic constraint model based on discriminative classifiers for...

Publication date: 2012